This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Numpy] Fix AWS Batch + Add Docker Support #1302

Merged 11 commits on Aug 20, 2020

Conversation

sxjscience (Member) commented Aug 18, 2020

  • Fix AWS Batch. The log stream name now needs to be read from describeJobsResponse.
  • Add Docker support. Launching the container currently starts a JupyterLab development environment with GluonNLP installed. The image has been uploaded to Docker Hub: https://hub.docker.com/repository/docker/gluonai/gluon-nlp
docker pull gluonai/gluon-nlp:v1.0.0
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 gluonai/gluon-nlp:v1.0.0

@dmlc/gluon-nlp-committers

Should solve #1243 and #1139

@sxjscience changed the title from "[Numpy] Fix Batch + Add Docker Support" to "[Numpy] Fix AWS Batch + Add Docker Support" on Aug 18, 2020

codecov bot commented Aug 18, 2020

Codecov Report

Merging #1302 into master will increase coverage by 0.15%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1302      +/-   ##
==========================================
+ Coverage   84.14%   84.30%   +0.15%     
==========================================
  Files          42       42              
  Lines        6397     6397              
==========================================
+ Hits         5383     5393      +10     
+ Misses       1014     1004      -10     
| Impacted Files | Coverage Δ |
|---|---|
| src/gluonnlp/__init__.py | 100.00% <100.00%> (ø) |
| src/gluonnlp/utils/misc.py | 50.63% <0.00%> (-1.27%) ⬇️ |
| src/gluonnlp/data/loading.py | 83.39% <0.00%> (+5.28%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 32e87d4...dbc34be.
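The percentages in the coverage diff can be re-derived from the hit and line counts. The sketch below is only an arithmetic check. The rounding mode (rounding down to two decimals) is an assumption inferred from these three numbers, not taken from Codecov documentation, and `floored_pct` is a hypothetical helper name.

```python
from fractions import Fraction
from math import floor


def floored_pct(ratio: Fraction) -> str:
    """Format a ratio as a percentage rounded down to two decimals."""
    hundredths = floor(ratio * 10000)  # percent * 100, rounded toward -infinity
    return f"{hundredths / 100:.2f}"


# Counts copied from the coverage diff: hits / lines before and after the PR.
before = Fraction(5383, 6397)
after = Fraction(5393, 6397)

print(floored_pct(before))          # master
print(floored_pct(after))           # this PR
print(floored_pct(after - before))  # change, computed from the exact ratios
```

Computing the change from the exact ratios matters here: subtracting the two already-rounded percentages would give a slightly different figure than the one in the report.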

same "printed page" as the copyright notice for easier
identification within third-party archives.

Copyright [yyyy] [name of copyright owner]
Member:

Let me get back to you on what we should update this to.

Comment on lines 83 to 94
RUN wget -c https://www.openssl.org/source/openssl-${OPENSSL_VERSION}.tar.gz \
&& apt-get update \
&& apt remove -y --purge openssl \
&& rm -rf /usr/include/openssl \
&& apt-get install -y \
ca-certificates \
&& tar -xzvf openssl-${OPENSSL_VERSION}.tar.gz \
&& cd openssl-${OPENSSL_VERSION} \
&& ./config && make -j $(nproc) && make test \
&& make install \
&& ldconfig \
&& cd .. && rm -rf openssl-*
Contributor:

Why?

Member Author:

Do you mean why we should install OpenSSL?

Contributor:

Why compile openssl from source?

Member Author:

I guess it's because we may want a customized version of OpenSSL. There might also be security concerns, so we may not want to install it from another source. This is the approach adopted in DLC: https://github.com/aws/deep-learning-containers/blob/95e4d9c9cba8b6dffec61637452b4bbd46bb59bd/mxnet/training/docker/1.6.0/py3/cu101/Dockerfile.gpu#L113-L124

Contributor:

Ubuntu manages the security fixes for you. No need to compile from source. I recommend you remove this part.

@@ -147,10 +148,10 @@ def main():
             sys.exit(status == 'FAILED')

         elif status == 'RUNNING':
-            logStreamName = getLogStream(logGroupName, jobName, jobId)
+            logStreamName = describeJobsResponse['jobs'][0]['container']['logStreamName']
Contributor:

Good Job!
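The change quoted above reads the log stream name straight out of the DescribeJobs response. As a minimal sketch of that lookup, the helper below does the same indexing but tolerates a response in which the field is not populated yet (for example, before the job reaches RUNNING). The function name and the sample dictionaries are hypothetical and hand-written for illustration; they are not taken from submit-job.py.

```python
from typing import Optional


def log_stream_name(describe_jobs_response: dict) -> Optional[str]:
    """Return the log stream name of the first job, or None if it is not set yet."""
    jobs = describe_jobs_response.get("jobs") or []
    if not jobs:
        return None
    container = jobs[0].get("container") or {}
    return container.get("logStreamName")


# Hand-written stand-ins for DescribeJobs responses (all values are made up).
running = {
    "jobs": [{
        "jobId": "example-job-id",
        "status": "RUNNING",
        "container": {"logStreamName": "example-job-def/default/example-task-id"},
    }]
}
# A job that has not started yet carries no log stream name.
submitted = {
    "jobs": [{"jobId": "example-job-id", "status": "SUBMITTED", "container": {}}]
}

print(log_stream_name(running))
print(log_stream_name(submitted))
```

In a real script the response would presumably come from a boto3 Batch client's `describe_jobs(jobs=[job_id])` call, and the returned name could then be passed to a CloudWatch Logs client's `get_log_events`. Neither call appears in the quoted diff, so treat that wiring as an assumption.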

Add LICESE + Examples for batch

Update docker image

update

Update README.md

Update README.md

Update ubuntu18.04-devel.Dockerfile

Update ubuntu18.04-devel.Dockerfile

Update ubuntu18.04-devel.Dockerfile

update

Update ubuntu18.04-devel-gpu.Dockerfile

fix

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update submit-job.py

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

update

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

update

update

Update submit-job.py

Update submit-job.py

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

try to fix

fix batch

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update submit-job.py

Update ubuntu18.04-devel-gpu.Dockerfile

simplify bert test

add files

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

fix

Update ubuntu18.04-devel-gpu.Dockerfile
@sxjscience (Member Author):

@leezu I tried to compile with the latest MXNet and install horovod via Haibin's branch. However, I'm seeing this error message:

/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#2}> const&, std::_Manager_operation)':
print_graph_ir.cc:(.text+0x3bb): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#1}>::_M_manager(std::_Any_data&, std::_Function_base::_Base_manager<nnvm::pass::PrintGraphIR_(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::ostream&)::{lambda(unsigned int, std::ostream&)#1}> const&, std::_Manager_operation)':
print_graph_ir.cc:(.text+0x59b): relocation truncated to fit: R_X86_64_PC32 against `.data.rel.ro'
CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o: In function `nnvm::pass::GetVectorPrinter(nnvm::Graph const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
print_graph_ir.cc:(.text+0x6d8): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo name for std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >' defined in .rodata._ZTSSt6vectorIN4nnvm6TShapeESaIS1_EE[_ZTSSt6vectorIN4nnvm6TShapeESaIS1_EE] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/infer_shape_type.cc.o
print_graph_ir.cc:(.text+0x708): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo name for std::vector<int, std::allocator<int> >' defined in .rodata._ZTSSt6vectorIiSaIiEE[_ZTSSt6vectorIiSaIiEE] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/infer_shape_type.cc.o
print_graph_ir.cc:(.text+0x72b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `typeinfo name for std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >' defined in .rodata._ZTSSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE[_ZTSSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE] section in CMakeFiles/nnvm.dir/3rdparty/tvm/nnvm/src/pass/print_graph_ir.cc.o
print_graph_ir.cc:(.text+0x75b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x7a0): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x7c5): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x7e3): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/libstdc++.so
print_graph_ir.cc:(.text+0x816): additional relocation overflows omitted from the output

So I reverted to using the MXNet wheel package instead and commented out the Horovod-related code. Could you approve it if you think that is appropriate?

Comment on lines +57 to +63
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team in GitHub issues/pull requests
by mentioning @dmlc/gluon-nlp-committers. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.
Contributor:

Recommending that incidents be reported in a GitHub issue may not meet the confidentiality you promise here.

Member Author:

Should we create an issue and revise it in a later PR? Alternatively, I can remove this CODE_OF_CONDUCT for now.

&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# Install CMake 3.13.3. The default in Ubuntu 18.04 is cmake 3.10.2
Contributor:

pip install cmake will be easier ;)

@sxjscience (Member Author):

Horovod should have been added to the dockerfile.

@zheyuye (Member) left a comment:

I can confirm that the Docker image builds and has been uploaded to Docker Hub, and that all MXNet-related unit tests in Horovod pass.

@sxjscience merged commit d8b68c6 into dmlc:master on Aug 20, 2020
zheyuye added a commit to zheyuye/gluon-nlp that referenced this pull request Aug 21, 2020
commit d8b68c6
Author: Xingjian Shi <[email protected]>
Date:   Thu Aug 20 08:47:56 2020 -0700

    [Numpy] Fix AWS Batch + Add Docker Support (dmlc#1302)

    * Update submit-job.py

    Add LICESE + Examples for batch

    Update docker image

    update

    Update README.md

    Update README.md

    Update ubuntu18.04-devel.Dockerfile

    Update ubuntu18.04-devel.Dockerfile

    Update ubuntu18.04-devel.Dockerfile

    update

    Update ubuntu18.04-devel-gpu.Dockerfile

    fix

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update submit-job.py

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    update

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    update

    update

    Update submit-job.py

    Update submit-job.py

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    try to fix

    fix batch

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update submit-job.py

    Update ubuntu18.04-devel-gpu.Dockerfile

    simplify bert test

    add files

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    Update ubuntu18.04-devel-gpu.Dockerfile

    fix

    Update ubuntu18.04-devel-gpu.Dockerfile

    * Update ubuntu18.04-devel-gpu.Dockerfile

    * try to add back mxnet support

    * Update ubuntu18.04-devel-gpu.Dockerfile

    * Update ubuntu18.04-devel-gpu.Dockerfile

    * update

    * Update ubuntu18.04-devel-gpu.Dockerfile

    * Update ubuntu18.04-devel-gpu.Dockerfile

    * Update ubuntu18.04-devel-gpu.Dockerfile

    * fix issues

    * update

commit 6ae558e
Author: ht <[email protected]>
Date:   Thu Aug 20 23:47:30 2020 +0800

    [FEATURE]Horovod support for training transformer (PART 2) (dmlc#1301)

    * set default shuffle=True for boundedbudgetsampler

    * fix

    * fix log condition

    * use horovod to train transformer

    * fix

    * add mirror wmt dataset

    * fix

    * rename wmt.txt to wmt.json and remove part of urls

    * fix

    * tuning params

    * use get_repo_url()

    * update average checkpoint cli

    * paste result of transformer large

    * fix

    * fix logging in train_transformer

    * fix

    * fix

    * fix

    * add transformer base config

    * fix

    * change to wmt14/full

    * print more sacrebleu info

    * fix

    * add test for num_parts and update behavior of boundedbudgetsampler with even_size

    * fix

    * fix

    * fix

    * fix logging when using horovd

    * udpate doc of train transformer

    * add test case for fail downloading

    * add a ShardedIterator

    * fix

    * fix

    * fix

    * change mpirun to horovodrun

    * make the horovod command complete

    * use print(sampler) to cover the codes of __repr__ func

    * empty commit

    * add test case test_sharded_iterator_even_size

    Co-authored-by: Hu <[email protected]>
zheyuye added a commit to zheyuye/gluon-nlp that referenced this pull request Aug 21, 2020
commit 7525618
Author: ZheyuYe <[email protected]>
Date:   Fri Aug 21 11:25:38 2020 +0800

    Squashed commit of the following:

    commit d8b68c6
    Author: Xingjian Shi <[email protected]>
    Date:   Thu Aug 20 08:47:56 2020 -0700

        [Numpy] Fix AWS Batch + Add Docker Support (dmlc#1302)

    commit 6ae558e
    Author: ht <[email protected]>
    Date:   Thu Aug 20 23:47:30 2020 +0800

        [FEATURE]Horovod support for training transformer (PART 2) (dmlc#1301)

commit 1403c6e
Author: ZheyuYe <[email protected]>
Date:   Fri Aug 21 11:15:44 2020 +0800

    update uncased_bert_large

commit 733a4b6
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 20:16:39 2020 +0800

    adjust uncased_bert_large

commit 770f079
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 15:10:57 2020 +0800

    Revert "merge xingjian's"

    This reverts commit ea1f1aa.

commit fe74dda
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 14:07:36 2020 +0800

    update electra small

commit 8972343
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 14:00:57 2020 +0800

    add command to readme

commit 8fcde49
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 12:30:47 2020 +0800

    revise

commit 7a625c4
Author: ZheyuYe <[email protected]>
Date:   Thu Aug 20 12:21:58 2020 +0800

    update reamde

commit 071c6dd
Author: ZheyuYe <[email protected]>
Date:   Wed Aug 19 17:14:53 2020 +0800

    update bert squad command

commit ea1f1aa
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 18:07:01 2020 +0800

    merge xingjian's

commit 859ab4d
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 17:47:01 2020 +0800

    dummy example

commit 633e683
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 17:36:31 2020 +0800

    list_backbone_names

commit b4aac59
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 17:32:51 2020 +0800

    update readme

commit 54301d9
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 13:59:06 2020 +0800

    revise batch squad

commit e019e27
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 13:58:49 2020 +0800

    bash convert

commit e01eda0
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 11:10:51 2020 +0800

    update roberta

commit 1730ff7
Author: ZheyuYe <[email protected]>
Date:   Tue Aug 18 10:15:27 2020 +0800

    revise submit

commit de0b4c9
Author: ZheyuYe <[email protected]>
Date:   Mon Aug 17 16:07:58 2020 +0800

    upload batch files

commit 175de01
Author: ZheyuYe <[email protected]>
Date:   Mon Aug 17 16:05:02 2020 +0800

    fix

commit 0460ed3
Author: ZheyuYe <[email protected]>
Date:   Mon Aug 17 15:48:52 2020 +0800

    upload commands
sxjscience pushed a commit that referenced this pull request Aug 22, 2020
* Squashed commit of the following:

commit 7525618
Author: ZheyuYe <[email protected]>
Date:   Fri Aug 21 11:25:38 2020 +0800

    Squashed commit of the following:

    d8b68c6 [Numpy] Fix AWS Batch + Add Docker Support (#1302)
    6ae558e [FEATURE]Horovod support for training transformer (PART 2) (#1301)
    1403c6e update uncased_bert_large
    733a4b6 adjust uncased_bert_large
    770f079 Revert "merge xingjian's"
    fe74dda update electra small
    8972343 add command to readme
    8fcde49 revise
    7a625c4 update reamde
    071c6dd update bert squad command
    ea1f1aa merge xingjian's
    859ab4d dummy example
    633e683 list_backbone_names
    b4aac59 update readme
    54301d9 revise batch squad
    e019e27 bash convert
    e01eda0 update roberta
    1730ff7 revise submit
    de0b4c9 upload batch files
    175de01 fix
    0460ed3 upload commands

* add mobilebert

* replace remote

* fix branch

* fix typo

Co-authored-by: Yuma1L <[email protected]>
5 participants